ACTION DESIGNER
SEARCHING FEATURES
As of 4/20/92
The regular expression notation used by the searching routine is
very similar to the standard notation defined by the UNIX editor ed.
A regular expression (RE) specifies a set of character strings. A
substring in the searched string is said to be matched by the RE
if the substring is one of the character strings allowed by the RE.
If you have never used regular expressions before, the notation
is a bit arcane. You'll learn to love it. I won't give up my
Brief editor, even under Windows, until I find an editor with
comparable RE capability.
SIMPLE REGULAR EXPRESSIONS
ORDINARY CHARACTERS
The first kind of ordinary character is any character that is not a
meta-character; it is a one-character RE that matches itself.
The second kind of ordinary character is a two-character expression
that makes what would otherwise be a meta-character (one of the
special characters described below) into an ordinary character. The
first character of this sequence is a backslash.
BACKSLASH (\)
A backslash (\) followed by a meta-character is an RE that makes
the meta-character into an ordinary character. For example, \.
matches only a period.
PERIOD
A period (.) is a one-character RE that matches any character.
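Ordinary characters, backslash escapes, and the period are enough for a tiny matcher. The Python sketch below is only an illustration, not the searching routine itself. It makes three assumptions: a backslash may escape any character, a search reports the leftmost match, and the function names (`tokenize_simple`, `match_at`, `search`) are hypothetical.

```python
# Hypothetical sketch: names are illustrative, not from the searching routine.
ANY = object()  # sentinel token for an unescaped period


def tokenize_simple(pattern):
    """Split a pattern built only from ordinary characters, backslash
    escapes, and periods into tokens: a one-character str for a literal,
    or ANY for a period that matches any character."""
    tokens = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":
            # Assumption: a backslash makes the next character ordinary.
            if i + 1 == len(pattern):
                raise ValueError("pattern ends with a lone backslash")
            tokens.append(pattern[i + 1])
            i += 2
        elif ch == ".":
            tokens.append(ANY)
            i += 1
        else:
            tokens.append(ch)  # an ordinary character matches itself
            i += 1
    return tokens


def match_at(pattern, text, pos):
    """True if the simple pattern matches text beginning at index pos."""
    tokens = tokenize_simple(pattern)
    if pos + len(tokens) > len(text):
        return False
    return all(tok is ANY or tok == text[pos + k]
               for k, tok in enumerate(tokens))


def search(pattern, text):
    """Index of the leftmost match (an assumption), or -1 if none."""
    for pos in range(len(text) + 1):
        if match_at(pattern, text, pos):
            return pos
    return -1


print(search("a.c", "xxabcx"))     # 2: the period matches b
print(search(r"a\.c", "abc a.c"))  # 4: \. matches only a literal period
```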
CHARACTER CLASS
A non-empty string of characters enclosed in square brackets ([])
is an RE that matches any one character in that string.
For example, [STV] matches an S, a T, or a V.
The following characters have special meanings within the square
brackets.
^ If the first character of the string is a circumflex (^),
the RE matches any character that the bracket expression
without the ^ would not match. For example, [^0-9] matches
any character that is not a digit. The ^ has this special
meaning only if it occurs first in the string.
- The minus (-) may be used to indicate a range of consecutive
ASCII characters. For example, [0-9] is equivalent to
[0123456789].
The - loses this special meaning in the following cases:
if the - occurs first in the string.
if the - occurs immediately after an initial ^.
if the - occurs last in the string.
if the - is the first character after a range.
For example, [0-9-a] matches any digit from 0
through 9, a dash, or an a.
if the - is the terminating character of a range.
For example, [+--a-z] matches any character in
the range + through - (a plus sign, a comma, or
a dash) or in the range a through z.
] The right square bracket (]) does not terminate the string
when it is the first character in the string or immediately
follows an initial ^. For example, []a-f] matches either a
right square bracket (]) or one of the letters a through f.
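The bracket rules above can be checked with a small parser. This Python sketch is a hypothetical illustration: `parse_class` and `class_matches` are not names from the searching routine. It assumes plain ASCII, follows the cases listed above, and does not try to resolve ambiguous combinations such as two dashes in a row.

```python
# Hypothetical sketch of the bracket-expression rules; assumes ASCII input.
def parse_class(pattern, i=0):
    """Parse a bracket expression that starts at pattern[i] == '['.

    Returns (negated, chars, end): negated is True for a leading ^,
    chars is the set of listed characters, and end is the index just
    past the closing ]."""
    if pattern[i] != "[":
        raise ValueError("expected '['")
    i += 1
    negated = False
    if i < len(pattern) and pattern[i] == "^":
        negated = True  # ^ is special only as the first character
        i += 1
    chars = set()
    first = True
    while True:
        if i >= len(pattern):
            raise ValueError("unterminated character class")
        c = pattern[i]
        if c == "]" and not first:
            return negated, chars, i + 1
        first = False  # a ] in the first position is an ordinary character
        # c-x forms a range only when the - is followed by a character
        # other than the closing ]; otherwise c (and later the -) is ordinary.
        if i + 2 < len(pattern) and pattern[i + 1] == "-" and pattern[i + 2] != "]":
            lo, hi = ord(c), ord(pattern[i + 2])
            chars.update(chr(k) for k in range(lo, hi + 1))
            i += 3  # resume after the range
        else:
            chars.add(c)
            i += 1


def class_matches(cls, ch):
    """True if the one-character string ch matches the bracket RE cls."""
    negated, chars, end = parse_class(cls)
    if end != len(cls):
        raise ValueError("text after the closing ]")
    return (ch in chars) != negated


print(class_matches("[0-9-a]", "-"))   # True: dash after a range
print(class_matches("[+--a-z]", ","))  # True: comma lies between + and -
print(class_matches("[]a-f]", "]"))    # True: leading ] is ordinary
```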
THE POSITIONAL REGULAR EXPRESSION
The positional regular expression indicates where in a line of
text an operation is to occur. It consists of one or more positions,
separated by commas and enclosed in angle brackets (<>). For example:
<0> is an RE that matches the null string at position 0,
the beginning of the string.
<0,5,10> is an RE that matches the null string at position 0,
or the null string at position 5, or the null string
at position 10.
~ End-of-Line Specification: If a position is preceded
by a tilde (~), the position is measured from the
end of the string.
<~0> matches the null string at the end of the string.
<~4> matches the null string at position 4 counting
from the end of the string.
- Range Specification: If two positions are separated by
a minus (-), the RE matches at any position in that
range, inclusive.
<0-5> matches any of the null strings at positions 0
through 5.
<5-~5> matches any null string from position 5 counting
from the beginning to position 5 counting from
the end.
In a range specification, the second position must not
come before the first; otherwise the RE never matches.
<5-~5> always fails to match in a string of 9 or
fewer characters, since position 5 from the
beginning then lies after position 5 from the end.
<~0-~5> always fails; <~5-~0> is the correct form.
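The position arithmetic can be made concrete. In this Python sketch, `resolve` and `positions` are hypothetical names. A position n counts from the start of the string, and ~n resolves to length minus n. A range that runs backwards yields no positions, which is why <5-~5> fails on strings of 9 or fewer characters. The text above does not say whether commas and ranges can be mixed in one specification, as in <0,5-7>; the sketch assumes they can.

```python
# Hypothetical sketch of positional-RE arithmetic; mixing commas and
# ranges in one specification is an assumption.
def resolve(token, length):
    """Turn '5' or '~5' into an index counted from the start of the string."""
    token = token.strip()
    if token.startswith("~"):
        return length - int(token[1:])  # ~n counts back from the end
    return int(token)


def positions(spec, length):
    """Positions at which <spec> matches the null string in a string of
    `length` characters; spec is the text between the angle brackets."""
    found = set()
    for item in spec.split(","):
        if "-" in item:
            first, last = item.split("-")
            lo, hi = resolve(first, length), resolve(last, length)
            found.update(range(lo, hi + 1))  # empty when hi < lo
        else:
            found.add(resolve(item, length))
    # Null strings exist only at indices 0 through length.
    return sorted(p for p in found if 0 <= p <= length)


print(positions("5-~5", 9))    # []: ~5 resolves to 4, before position 5
print(positions("5-~5", 12))   # [5, 6, 7]
print(positions("~5-~0", 20))  # [15, 16, 17, 18, 19, 20]
```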
COMPLEX REGULAR EXPRESSIONS
The following rules may be used to construct REs from other REs:
* An RE followed by an asterisk (*) is an RE that matches
zero or more occurrences of the RE.
ab(ba)*cb searches for all occurrences of ab followed by
zero or more occurrences of ba followed by cb.
The strings abbacb, abbabacb, abbabababacb,
and abcb all match this RE.
ab(ba)* searches for all occurrences of ab followed by
zero or more occurrences of ba. If several
consecutive occurrences of ba follow an ab in
the text, the match includes all of them.
(ba)* by itself always finds a match at the beginning of
the string (at worst the null string), since zero
occurrences are allowed.
+ An RE followed by a plus (+) is an RE that matches one
or more occurrences of the RE.
ab(ba)+ searches for all occurrences of ab followed by
one or more occurrences of ba. If several
consecutive occurrences of ba follow an ab in the
text (e.g., abbababa), the match includes the
entire sequence.
{} Replication Counts: An RE followed by {m}, {m,}, {,n} or
{m,n} is an RE that matches a range of occurrences of the
RE. The values of m and n must be non-negative integers.
{m} indicates exactly m occurrences of the RE.
{m,n} If m is less than n, then {m,n} indicates at
least m occurrences of the RE and no more than
n occurrences. If the RE occurs more than m
times, the match is made to the minimum
number, m.
ab(ba){2,4}: if abbabababa is found, the match
will be made to abbaba.
If m is greater than or equal to n, then {m,n}
indicates at least n occurrences of the RE and no
more than m occurrences. If the RE occurs more
than n times, the match is made to the longest
sequence, up to and including m occurrences.
ab(ba){4,2}: if abbabababa is found, the match will be
made to the entire sequence.
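{m,} is equivalent to {m,infinity}, so it matches at
least m occurrences and prefers the fewest; {,n} is
equivalent to {infinity,n}, so it matches at least n
occurrences and prefers the most. Consequently, * and +
are equivalent to {,0} and {,1}, respectively.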
$ Assignment: An RE followed by $c, where c is a letter,
matches whatever the RE alone would match and assigns
the matched text to c. (Upper- and lower-case letters
are equivalent.)
The expression <c>, where c is a letter, is an RE that
matches whatever text is currently assigned to c.
If no assignment has yet been made, it matches
the null string at any position.
| Alternation: REs separated by a vertical bar (|) form an
RE that matches any string in the text that matches
at least one of the component REs.
For example, (s|x|z) matches an s, an x, or a z.
() Grouping: An RE enclosed in parentheses matches exactly the
same strings as the RE without the parentheses. Parentheses
only group; for example, in ab(ba)* they make * apply to all of ba.
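Python's standard `re` module can serve as a stand-in for experimenting with these operators. It is a different engine, so this is an approximation, not a description of the searching routine. The hypothetical helper `translate_count` maps the replication counts described above onto Python quantifiers: a lazy quantifier (`{m,n}?`) when m is less than n, and a greedy one when m is greater than or equal to n. The assertions replay the examples from the text. Assignment is approximated with Python's named groups and backreferences, `(?P<c>...)` and `(?P=c)`, with two differences from <c> here: Python group names are case-sensitive, and Python rejects a reference to a group name that has not been defined earlier in the pattern.

```python
import math
import re

# Hypothetical helper; Python's re is only a stand-in for the searching routine.
def translate_count(braces):
    """Map a replication count such as '{2,4}' (syntax described above)
    to a Python re quantifier. An omitted value stands for infinity."""
    body = braces[1:-1]
    if "," not in body:
        return "{%d}" % int(body)  # {m}: exactly m occurrences
    left, right = body.split(",")
    if not left and not right:
        raise ValueError("at least one of m and n is required")
    m = int(left) if left else math.inf
    n = int(right) if right else math.inf
    if m < n:
        # At least m, at most n, preferring the fewest: a lazy quantifier.
        return "{%d,%s}?" % (m, "" if n == math.inf else n)
    # At least n, at most m, preferring the most: a greedy quantifier.
    return "{%d,%s}" % (n, "" if m == math.inf else m)


text = "abbabababa"
lazy = "ab(ba)" + translate_count("{2,4}")    # 'ab(ba){2,4}?'
greedy = "ab(ba)" + translate_count("{4,2}")  # 'ab(ba){2,4}'
print(re.search(lazy, text).group(0))    # abbaba
print(re.search(greedy, text).group(0))  # abbabababa

# * and + behave like {,0} and {,1}: both greedy.
print(translate_count("{,0}"), translate_count("{,1}"))  # {0,} {1,}

# Hypothetical assignment example, roughly [a-z]+$c-<c> under an assumed
# reading of the notation above: capture a word as c, then require it again.
print(bool(re.search(r"(?P<c>[a-z]+)-(?P=c)", "abc-abc")))  # True
```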
CONCATENATION
REs may be concatenated to form a single RE. The combined RE
matches any string formed by joining, in order, strings that
match each of the component REs.
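One way to make the definition of concatenation concrete is to model each RE as a function that takes the text and a starting index and returns the set of indices where a match can end. Concatenation then feeds every end of one RE in as a start of the next. The Python sketch below uses this set-of-end-positions model, a common textbook technique chosen here for illustration; all names in it are hypothetical. It reports every possible match and does not model which of several matches the searching routine selects.

```python
from typing import Callable, Set

# Hypothetical model: an RE is (text, start) -> set of indices where a match can end.
Matcher = Callable[[str, int], Set[int]]


def lit(ch: str) -> Matcher:
    """An ordinary character: matches itself."""
    return lambda text, i: {i + 1} if i < len(text) and text[i] == ch else set()


def concat(*parts: Matcher) -> Matcher:
    """Each part must begin where some match of the parts before it ended."""
    def match(text: str, i: int) -> Set[int]:
        ends = {i}
        for part in parts:
            ends = {e for start in ends for e in part(text, start)}
        return ends
    return match


def alt(*choices: Matcher) -> Matcher:
    """Alternation: the union of what each choice can match."""
    def match(text: str, i: int) -> Set[int]:
        return set().union(*(c(text, i) for c in choices))
    return match


def star(inner: Matcher) -> Matcher:
    """Zero or more occurrences: repeat inner until no new ends appear."""
    def match(text: str, i: int) -> Set[int]:
        ends, frontier = {i}, {i}
        while frontier:
            frontier = {e for s in frontier for e in inner(text, s)} - ends
            ends |= frontier
        return ends
    return match


def matches_whole(m: Matcher, text: str) -> bool:
    """True if the RE can match the entire text."""
    return len(text) in m(text, 0)


# ab(ba)*cb built from concatenations of single characters.
pattern = concat(lit("a"), lit("b"), star(concat(lit("b"), lit("a"))), lit("c"), lit("b"))
print([matches_whole(pattern, s) for s in ["abcb", "abbabacb", "abbcb"]])
# [True, True, False]
```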
PRECEDENCE
The suffix operators *, +, {}, and $ have the highest precedence.
Concatenation has the next highest precedence. Alternation (|) has
the lowest precedence. Thus ab*|c is read as (a(b*))|c. The order
of operation may be changed by grouping with parentheses.
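Python's `re` module ranks its operators in the same order (quantifiers, then concatenation, then alternation), so it can be used to check how an unparenthesized RE groups. This is an analogy with a different engine, not a test of the searching routine, and the sample strings are arbitrary.

```python
import re

# Analogy only: Python's re, not the searching routine described here.
# Suffix operators bind tightest: ab* is a(b*), not (ab)*.
print(bool(re.fullmatch(r"ab*", "abbb")))    # True
print(bool(re.fullmatch(r"ab*", "abab")))    # False
print(bool(re.fullmatch(r"(ab)*", "abab")))  # True

# Alternation binds loosest: ab|cd is (ab)|(cd), not a(b|c)d.
print(bool(re.fullmatch(r"ab|cd", "cd")))     # True
print(bool(re.fullmatch(r"ab|cd", "abd")))    # False
print(bool(re.fullmatch(r"a(b|c)d", "abd")))  # True
```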